The Index Distribution of Gaussian Random Matrices 
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We compute analytically, for large TV, the probability distribution of the number of positive eigenvalues 
(the index 7V+) of a random N x N matrix belonging to Gaussian orthogonal (/3 = 1), unitary (fi = 2) or 
symplectic (/3 = 4) ensembles. The distribution of the fraction of positive eigenvalues c = N+ /N scales, 
for large N, as 7(c, N) ~ exp [— /3iV 2< 3>(c)] where the rate function <£>(<:), symmetric around c = 1/2 and 
universal (independent of /3), is calculated exactly. The distribution has non-Gaussian tails, but even near its 
peak at c = 1/2 it is not strictly Gaussian due to an unusual logarithmic singularity in the rate function. 
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Random matrix theory (RMT) has played a central role in 
various branches of physics since its inception Jl]]. Through 
the years, different, seemingly unrelated problems in physics 
and mathematics have been linked via RMT. It is not surpris- 
ing then that the distributions of observables associated with 
random matrices play a very important role in a variety of 
physical contexts. Still, after more than half a century, certain 
natural questions about eigenvalue distributions have eluded a 
thorough treatment, in spite of their relevance to a broad range 
of subjects. 

As an example, classical disordered systems offer the ideal 
environment where RMT ideas and tools may be applied. 
Physical systems such as liquids and spin glasses are known 
to exhibit a rich energy or free energy landscape characterized 
by many extrema (minima, maxima and saddles) and rather 
complex stability patterns [2] which play an important role 
both in statics and dynamics of such systems. The stability 
of a stationary point of an A^-dimensional potential landscape 
V(x\ , X2, ■ ■ ■ , xn) is decided by the N real eigenvalues of the 
Hessian matrix fly = [d 2 V / dx.idxj] which is evidently sym- 
metric. If all N eigenvalues are positive (negative), the sta- 
tionary point is a local minimum (local maximum). If some, 
but not all, are positive it is a saddle. The number of positive 
eigenvalues < N + < N, called the index, is a key object of 
interest as it determines the number of directions in which a 
stationary point is stable. 

In many situations, important insights about the system 
can be gained by simply assuming that the Hessian is a real 
symmetric random matrix drawn from a Gaussian ensemble: 
Gaussian orthogonal ensemble characterized by the Dyson in- 
dex j3 = 1. This random Hessian model (RHM) has been 
studied extensively in the context of disordered systems yfl, 
landscape based string theory 0] and quantum cosmol- 
ogy (6|]. In RHM, the fraction of positive eigenvalues c = 
N + /N is a random variable whose distribution 7(c, N) is our 
main object of interest in this Letter. Although in RHM (3 — 1, 
it is also of interest to study the index distribution for other 
Gaussian ensembles namely the unitary (J3 — 2) and the sym- 
plectic ({3 — 4). In this Letter, we study the index distribution 
for general > 0. 



Due to the Gaussian symmetry of the ensemble, it is clear 
that on average half of the eigenvalues are positive (or nega- 
tive), implying (c) = 1/2. Thus, the distribution 7(c, N) = 
7(1 — c, N) must be symmetric around c = 1/2 with a peak at 
c = 1/2. Cavagna et al. studied 7(c, AT), for (3 — 1, using the 
replica method and some additional approximations [3]. They 
argued that for large N the distribution has a Gaussian peak 
around its mean 1 3 1 



T(c, N) ps exp 



TT 2 N 2 

2\n(N) 



(1) 



indicating that the variance ((c — 1/2) 2 ) ~ In (AT) / 'tt 2 ] N 2 , or 
equivalently ((N + - N/2) 2 ) pa \n(N)/n 2 for large N. In 
the opposite limit, near the tail c = 1 (or equivalently c = 0) 
y(l, AT), the probability that all eigenvalues are positive, was 
recently computed exactly for large N and for all (3 \M, 



7(1, N) ps exp [-(30N 2 



Z In(3). 



(2) 



It is then natural to ask how does the distribution behave in 
between, i.e., for < c < 1. In particular, the presence of 
In (AT) term in (Q]i presents a challenging puzzle: how does one 
smoothly interpolate the distribution between the two limits, 
the peak and the tails? 

In this Letter we resolve this outstanding puzzle by com- 
puting the distribution CP(c, AT) exactly for large A^ in the full 
range < c < 1 and for all (3. Let us summarize our main 
results. We show that to leading order for large N, 



?(c, A") ps exp [-(3N 2 $(c)] 



(3) 



where ps indicates liniiv^oo - In [^(c, N)] /((3N 2 ) = $(c). 
The exact rate function $(c), symmetric around c = 1/2 and 
universal (independent of (3), is given in (fl~8T > for 1/2 < c < 1 
and is plotted in Fig. ([3]). The fact that the logarithm of the 
probability ~ 0(N 2 ) for fixed c is quite natural, as it repre- 
sents the free energy of an associated Coloumb fluid of A^ 
charges (eigenvalues). The Coulomb energy of A^ charges 
clearly scales as ~ 0(N 2 ). In the limit c — * 1, we get 
$(1) = 6 = ln(3)/4 in agreement with (0. The distribution 



2 



is thus highly non-Gaussian near its tails. In the opposite limit 
c — > 1/2, we find a marginally quadratic behavior, modulated 
by an unusual logarithmic singularity 



$(c) 



1/2) 2 



2 ln(c- 1/2)' 



(4) 



The variance computed from our exact formula, ((N + — 
N/2) 2 ) w ln(iV)//?7r 2 for large TV perfectly agrees, for 
/3 = 1, with Ref. [3]. However, the distribution of N+ near its 
peak is not a Gaussian with variance ~ ln(iV) as claimed in 
y|], rather our exact result shows that the origin of the ln(iV) 
term is due to the logarithmic singularity associated with the 
rate function $(c) itself near c = 1/2. In addition to obtain- 
ing the full distribution T(c, N) thus solving this challeng- 
ing puzzle, our Coloumb gas approach also provides a new 
method of finding solutions to singular integral equation with 
two disjoint supports. This method is rather general and can 
be fruitfully applied to other related problems in RMT 

Our starting point is the joint distribution of N eigenval- 
ues of Gaussian random matrices parametrized by the Dyson 
index j3 £0] 



P(Xi 



n ^ 

l<j<k<N 



Xkf (5) 



where the normalization constant can be computed using 
the celebrated Selberg's integral. For large N, it is known 
that |H, Z N ~ exp [-^o^ 2 ] with fl Q = (3 +2m2)/8. 

The distribution P(N + ,N) of the index, i.e., the number 
of positive eigenvalues can be expressed in terms of the joint 
distribution 



d N +A 



*+A, 



r£({A,}) 



(6) 

where the integrals are restricted over configurations with only 
N+ positive eigenvalues and the binomial counts the differ- 
ent relabellings of the eigenvalues. The function E({Xi}) = 
J2i X 2 — J2j^k m \ Xj — Xk \ can be interpreted as the energy 
of a configuration of charged particles located at {A^} on the 
real line and P(N + , N) is the partition function of this fluid 
at inverse temperature f3/2. The distribution of the fraction 
c = N+/N is then simply T(c, N) = P(cN, N). 

In this Coulomb gas picture, N + = cN of the total N 
charges are confined to the positive real semiaxis. The charges 
repel each other via the 2-d Coulomb interaction (logarith- 
mic) and are also subject to an external confining potential 



(parabolic). As a result of these two competing energy scales, 
it is easy to see that typically A ~ 0(s/N) for large N 10]. 
The evaluation of such partition function in the large N limit 
is carried out in two steps [7]: i) a coarse-graining protocol, 
where one sums over all microscopic arrangements of Ai com- 
patible with a fixed and normalized (to unity) charge density 
function p(A, N) = N^ 1 J^. <5(A — Aj), and ii) a functional 
integral over all possible normalized charge density functions, 
upon using the scaling p(\,N) ~ A r_1 / 2 p(AA^ -1 / 2 ) where 
the scaling function p(x) satisfies p{x)dx = 1. 

The resulting functional integral over p(x) is then evalu- 
ated in the large N limit via a saddle point method. In phys- 
ical terms, this amounts to finding the equilibrium density of 
the fluid (minimizing its free energy) under the competing 
interactions (Coulomb repulsion and quadratic confinement) 
and the external constraint (N + = cN particles kept always 
on the positive semiaxis). This constrained Coulomb gas ap- 
proach has proven useful in a number of different contexts 
such as Gaussian and Wishart extreme eigenvalues |7j, IsJ, |9[], 
nonintersecting Brownian interfaces jlOll . quantum transport 
in chaotic cavities II ill , statistics of critical points in Gaussian 



landscapes 111 21 11311 . bipartite entanglement 111411 and also in 



information and communication systems JTll . 
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FIG. 1: The density of eigenvalues p*(x) (eq. (TT)) for c = 1/2 
(red), 3/4 (green) and 0.995 (blue). 



Using the above approach one gets, to leading order in large 



N, 



y(c,N)oc / T>[p}e~% N2s " lp] 



(7) 



with the action S c [p] given by: 



/oc 
dx x 2 p(x) — 
-co 



oo poo 



oc J — oo 



dxdx p(x)p{x) In \x — x'\+A\ 

I 



dx9{x)p(x) - c +A 2 / dxp(x) - 1 (8) 



where A\ and A 2 are Lagrange multipliers enforcing the frac- tion c of positive eigenvalues and the normalization of p, and 
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9(x) is the Heaviside step function. 

The equilibrium fluid density p*(x), which minimizes the 
action or the free energy, is obtained by the saddle point equa- 
tion SS c [p]/Sp — 0, resulting in the integral equation 



x 2 +A 1 9(x) + A 2 



p*(y)ln\x-y\dy (9) 



By taking one derivative with respect of x, we obtain, for x ^ 

0, 



x = Pr / dy 



p*(.y) _ 

x-y' 



(10) 



where Pr stands for the Cauchy's principal part. It turns 
out that there exists a closed formula, due to Tricomi flfSll . 
for the solution of such integral equations provided the so- 
lution has a single support. This is indeed the case in the 
two limiting situations c = 1/2 and c = 1. For c = 1/2, 
the solution is given by the celebrated Wigner's semicircle, 
p*(x) = i-v/2 - x 2 with -a/2 < x < \/2. In the oppo- 
site limit c = 1, all the eigenvalues are on the positive side 
and one again obtains a single support solution p*(x) = 

{2tt)- 1 ^J^/8/3- x)/x y^8/3 + 2x for < x < ,j8/3. 

In contrast, for 1/2 < c < 1, the solution generally con- 
sists of two disjoint supports: a blob of (1 — c)N negative 
eigenvalues and a blob of cN positive eigenvalues (see Fig. 
[TJ. Finding this two-support solution thus poses the principal 
technical challenge for 1/2 < c < 1. We have succeeded in 
finding this two-support solution exactly by iterating the Tri- 
comi formula for single-support solution twice, the details of 
which will be reported elsewhere 111 711 . Our main result is that 
for all 1/2 < c < 1, 



p*(x) = -\ (£ X) (x + L/a) (x + (1 - l/a) L) (11) 

7T V X 

where a, L parametrize the support of the solution x S 
[-L/a,-L(l - l/a)] U [0,L] where p*(x) > 0. They are 
implicitly given as functions of c by the equations: 



dy 
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and 



1-2/ 



v 2 + y + 



L = 



(o-l) 



TIC 

~2 



= ^ 1- 



(o-l) 



Va 2 



(13) 



It is easy to check that one recovers the correct single support 
solutions in the limiting situations c = 1/2 and c = 1. The 
density p*(x) for different c is given in Fig. Q] We have also 



verified this analytical prediction numerically by two different 
methods (see Fig. |2): direct diagonalization of small matrices 
and also by Monte Carlo simulation of the Coloumb gas lfl7ll . 

Using this saddle point solution in ©, we get 

3 (c, N) w exp(— /3iV 2 $(c)), where the rate function $(c) = 




FIG. 2: Analytical density p*(x) Jilt for c = 0.6 (solid black) to- 
gether with results from i) (red) numerical diagonalization of 10 6 
matrices of size 20 x 20, where only samples having 12 positive 
eigenvalues were retained for the statistics (c = 0.6), and ii) (blue) 
Montecarlo simulations of the Coulomb fluid with N = 50 particles. 



(l/2)S c [p*} -f2 ,^o = (3 + 2 In 2)/8 coming from the nor- 
malization Zff. To evaluate the saddle point action S c [p*] we 
next need to evaluate the single and double integral over p*{x) 
in ([8]). The double integral can be written in terms of a simple 
integral after multiplying (O by p* and then integrating over 
x. This gives 



*(c) 



3 In 2 1 . a . 1 . 1 . 
— + -(x 2 ) Aic A 2 , (14) 

8 4 4 x/ 4 4 ' 



where the average (•) is done with the measure p*{x). 

The Lagrange multipliersAi and A 2 can be obtained from 
(0 upon setting x = L > and x = —L/a < 

L 2 + Ai + A 2 = 2{\n{L-x)), (15) 
L 2 /a 2 +A 2 = 2{\n(x + L/a)). (16) 

It turns out that the averages on the right hand side can be 
simplified by first introducing a pair of functions 



(12) W c±) (aj)=a;- 




(xtL) 



L 

x±- 
a 



x±[l 



in 

a 



x y x 

(17) 

defined respectively for x > L and x > L/a. One can 
show [17] that they are essentially the moment generating 
functions of p*(x). The averages in (T5[ and ( fl~6b can then 
be expressed as simple integrals over W(±)(x). Skipping de- 
tails II 1711 . we obtain the following explicit expression for the 
rate function for 1/2 < c < 1, 



$(c) = -[L 2 -l-ln(2L 2 )] 



(1 



ln(a) - 



(l-c)(a 2 -l) 
4a 2 



L 



W( + )(x)dx - 



(1 



L/a 



W ( _ ) {x)dx. (18) 
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near c = 1/2. 

We also remark that for j3 = 2 it is possible to find an exact 
formula for the variance at finite N 11711 based on Andrej- 
eff formula and/or orthogonal polynomials with discontinu- 
ous weights 11811 . Using this formula we have evaluated the 
variance for all finite TV and found that the leading growth is 
precisely A(TV) ~ ln(TV)/27r 2 in agreement with the asymp- 
totic result in $1% , they differ only for subleading terms in TV 
(see Fig. [3](bottom)). 

The work presented here can be generalized in several di- 
rections. Our method to find explicitly two-support solutions 
to singular integral equation is quite general and can be ap- 
plied to other problems. For example, one can compute the 
distribution of the number of eigenvalues bigger than a fixed 
value z. This is particularly relevant for Wishart matrices that 
play an important role in multivariate data analysis |9J]. Our 
method can also be applied to more exotic ensembles of ran- 
dom matrices that arise in connection with 2-d quantum grav- 
ity Il9ll . It would also be interesting to investigate if the log- 
arithmic growth of the variance of the index is universal and 
holds even for non-Gaussian ensembles. 



FIG. 3: (Top) The large deviation function $(c); (Bottom) the vari- 
ance of the index as a function of ln(TV) for f3 = 2 (dotted, exact 
finite TV" formula; solid, large TV). A linear fit for the former gives 
A(TV) ~ 0.176 + 0.052 In TV with the prefactor 0.052 in good agree- 
ment with the leading theoretical prefactor (2iv 2 )~ 1 ~ 0.051 for the 
large TV result. 

Eq. ( fT8l is the principal result of this Letter. It is again 
easy to check that in the two limits c = 1/2 and c = 1, one 
recovers correctly $(1/2) = and $(1) = (ln3)/4. For 
arbitrary c, the integrals have to be evaluated numerically. The 
rate function $(c) is plotted for < c < 1 in the top panel 
of Figure[3] It is symmetric around c = 1/2 with a minimum 
at c = 1/2 and grows monotonically from $(1/2) = to 
$(1) = (ln3)/4. To see how $(c) behaves near its minimum 
c = 1/2, we make a perturbation expansion of ( fT8l setting 
c = 1/2 + 5 with S > small. Since for c = 1/2, a = 1, 
we first expand ( fT2l setting a = 1 + a. To leading order we 
get a w 7r<5/ ln(l/<5). Inserting this result in ([TBI followed 
by straightforward expansion gives Using this expression 
of $(c) in (01, one can then easily compute the variance of 
TV+ = cTV 

A(TV) = TV 2 ((c - 1/2) 2 ) ~ lniV+ 0(1) (19) 

which, for = 1, agrees with the asymptotic result in 
However, the logarithmic growth of the variance is evidently 
due to the logarithmic singularity in the rate function $(c) 
itself in © and the index distribution is strictly not Gaussian 
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